Abstract

The least squares problem is formulated in terms of $\ell_p$ quasi-norm regularization ($0 < p < 1$). Two formulations
are considered: (i) an $\ell_p$-constrained optimization and (ii) an $\ell_p$-penalized (unconstrained) optimization. Due to the
nonconvexity of the $\ell_p$ quasi-norm, the solution paths of the regularized least squares problem are not ensured to be
continuous. A critical path, which is a maximal continuous curve consisting of critical points, is therefore considered
separately. The critical paths are piecewise smooth, as can be seen from the viewpoint of the variational method,
and generally contain non-optimal points such as saddle points and local maxima as well as global/local minima.
Along each critical path, the correspondence between the regularization parameters (which govern the 'strength' of
regularization in the two formulations) is non-monotonic and, more specifically, it has multiplicity. Two paths of
critical points connecting the origin and an ordinary least squares (OLS) solution are highlighted. One is a main
path starting at an OLS solution, and the other is a greedy path starting at the origin. Part of the greedy path can
be constructed with a generalized Minkowskian gradient. The breakpoints of the greedy path coincide with the
step-by-step solutions generated by using orthogonal matching pursuit (OMP), thereby establishing a direct link
between OMP and $\ell_p$-regularized least squares.



This work was partially supported by a JSPS Grant-in-Aid (24760292). A preliminary version of this work was presented at the IEEE 
International Symposium on Information Theory (ISIT), 2012 [1]. 



I. Introduction 

The present paper addresses the least squares problem by giving two different formulations for the $\ell_p$ quasi-norm
($0 < p < 1$) regularization. We will use a simple linear system model:

$$ y := [y_1, y_2, \cdots, y_d]^{\mathsf T} = X^{\mathsf T} \beta_{\diamond} + v \in \mathbb{R}^d, \tag{1} $$

where $X := [x_1\ x_2\ \cdots\ x_d] \in \mathbb{R}^{n \times d}$ is a known matrix with its columns being the design variables,
$\beta_{\diamond} \in \mathbb{R}^n$ consists of the (unknown) explanatory parameters, and $v \in \mathbb{R}^d$ is the noise vector. The first
formulation under $\ell_p$-regularization for $p > 0$ is as follows:^1

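$$ (\mathcal{P}_c^p) \quad \underset{\beta \in \mathbb{R}^n}{\text{minimize}} \ \ \varphi(\beta) := \frac{1}{2} \left\| X^{\mathsf T} \beta - y \right\|_2^2 $$

$$ \text{subject to} \ \ F_p(\beta) := \frac{1}{p} \|\beta\|_p^p = \sum_{i=1}^{n} \psi_p(\beta_i) \le c, \tag{2} $$

where $c \ge 0$, $\|\cdot\|_p$ denotes the $\ell_p$ (quasi-)norm for any $p > 0$, and $\psi_p(\beta) := \frac{1}{p} |\beta|^p$, $\beta \in \mathbb{R}$.
Problem $(\mathcal{P}_c^p)$ is referred to as the $\ell_p$-constrained least squares problem. The second formulation is as follows: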

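$$ (\mathcal{L}_\lambda^p) \quad \underset{\beta \in \mathbb{R}^n}{\text{minimize}} \ \ f_\lambda(\beta) := \varphi(\beta) + \lambda F_p(\beta), \tag{3} $$

where $\lambda \ge 0$ is a Lagrange multiplier. Problem $(\mathcal{L}_\lambda^p)$ is referred to as the $\ell_p$-penalized least squares problem.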
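As a concrete reading of (2) and (3), the following minimal Python sketch evaluates the cost $\varphi$, the regularizer $F_p$, and the penalized objective $f_\lambda$ for small dense inputs. The function names (`phi`, `F_p`, `f_lam`, `is_feasible`) and the list-of-rows storage of $X$ are illustrative assumptions, not notation from the paper.

```python
def phi(X, y, beta):
    """Least squares cost 0.5 * ||X^T beta - y||_2^2, with X stored as n rows of length d."""
    n, d = len(X), len(y)
    residual = [sum(X[i][k] * beta[i] for i in range(n)) - y[k] for k in range(d)]
    return 0.5 * sum(r * r for r in residual)


def F_p(beta, p):
    """Regularizer (1/p) * ||beta||_p^p = sum_i (1/p) * |beta_i|^p, for p > 0."""
    return sum(abs(b) ** p for b in beta) / p


def f_lam(X, y, beta, lam, p):
    """Penalized objective phi(beta) + lam * F_p(beta) of the unconstrained formulation (3)."""
    return phi(X, y, beta) + lam * F_p(beta, p)


def is_feasible(beta, c, p):
    """Constraint F_p(beta) <= c of the constrained formulation (2)."""
    return F_p(beta, p) <= c
```

For instance, with $p = 0.5$ the vector $[1, 0, 4]$ has $F_{0.5} = 2(1 + 0 + 2) = 6$, so it is feasible for (2) exactly when $c \ge 6$.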

Both problems for $p < 1$ are related closely to sparse optimization problems encountered in various applications
and have therefore been studied extensively. In the context of sparse signal recovery or compressed sensing [2-4],
underdetermined systems ($n \gg d$) are assumed, and the object is to recover a sparse unknown vector from a
small number of measurements. In the context of model selection [5,6], it is desired to select variables based on
a sufficiently large number (or sometimes a small number) of measurements. In the case of $p = 1$, $F_1$ (i.e., the
$\ell_1$ norm) is a convex function, and it is widely known that $(\mathcal{P}_c^1)$ and $(\mathcal{L}_\lambda^1)$ are equivalent in the sense that the
solutions of these problems coincide with each other (and also that there is a continuous monotone correspondence
between $c$ and $\lambda$). In this case, $(\mathcal{P}_c^1)$ is referred to as a Lasso [5]. The least angle regression (LARS) algorithm
has been proposed [6] for constructing the solution path of $(\mathcal{P}_c^1)$ with the value of $c$ sliding from zero to infinity.
Although LARS has been mainly studied in connection with overdetermined systems [6], it has also been applied
to underdetermined systems (see [7]).

The $\ell_p$ norm becomes closer to the $\ell_0$ norm as $p$ approaches zero, although $F_p$ is a nonconvex function for
$p < 1$. Considerable effort has therefore been devoted to the least squares problem formulated in terms of $\ell_p$ norm
regularization for $p < 1$ [8-16]. It has been shown experimentally that the use of the $\ell_p$ norm yields a sparser
solution and a lower prediction error for model selection compared with the $\ell_1$ norm [16]. It has also been proven
that fewer measurements as well as weaker conditions are enough for sparse signal recovery [10, 15, 16]. It is,
therefore, important to see whether equivalence between $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ holds even for $p < 1$ and, if not, how the
equivalence is modified. As yet, however, this fundamental question has not been investigated.

In this paper, we shed light on this hitherto uninvestigated question through an extension of LARS to the
nonconvex case of $p < 1$. As expected, the case of $p < 1$ is significantly different from the case of $p = 1$ due to
the nonconvexity of $F_p$. We prove that the solutions (i.e., the global minima) of $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ are different for
$p < 1$. However, there is a remarkable correspondence between the critical points of $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$. The present
paper studies the critical paths of the two problems and elucidates their structures. The main body of the paper
consists of three parts. In the first part, we study the solution paths (the paths of global minima) of the problems
$(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ with the parameters $c$ and $\lambda$ sliding from zero to infinity and show that the two paths are different
from each other. The solution of $(\mathcal{P}_c^p)$ for $c = 0$ is obviously the zero vector, and as $c$ increases continuously, the
solution moves away from the origin continuously. Indeed, the behavior of the $(\mathcal{P}_c^p)$ solution path in the vicinity
of the origin is homotopically the same as that of the $(\mathcal{P}_c^1)$ solution path. On the other hand, the $(\mathcal{L}_\lambda^p)$ path is
quite different. The solution of $(\mathcal{L}_\lambda^p)$ for a sufficiently large $\lambda$ is the zero vector and, as $\lambda$ decreases continuously,
the solution jumps from the origin to a point on the $(\mathcal{P}_c^p)$ path. In short, the $(\mathcal{L}_\lambda^p)$ path is always discontinuous at
the origin, whereas the $(\mathcal{P}_c^p)$ path always leaves the origin continuously. (Note, however, that the continuity of the
whole $(\mathcal{P}_c^p)$ path is not necessarily guaranteed, as will be seen in Example A.2 of the appendix.) In addition, the
positive semidefiniteness of the Hessian matrix of $f_\lambda$ is a necessary and sufficient condition for local minimality
in $(\mathcal{L}_\lambda^p)$, but it is only sufficient for local minimality in $(\mathcal{P}_c^p)$. As a result, the $(\mathcal{P}_c^p)$ path contains the $(\mathcal{L}_\lambda^p)$ path as
its proper subset.

^1 The formulation (2) is essentially equivalent to the following problem: $(\mathcal{Q}_\epsilon^p)$ minimize$_{\beta \in \mathbb{R}^n}$ $F_p(\beta)$ subject to $\varphi(\beta) \le \epsilon$ for a given
$\epsilon > 0$. Problem $(\mathcal{Q}_\epsilon^p)$, and thus problem $(\mathcal{P}_c^p)$, for $0 < p < 1$ is a relaxation of the sparse optimization problem: $(\mathcal{Q}_\epsilon^0)$ minimize$_{\beta \in \mathbb{R}^n}$ $\|\beta\|_0$
subject to $\varphi(\beta) \le \epsilon$. Here $\|\cdot\|_0$ counts the number of nonzero entries of a vector.

In the second part, we enlarge the problems to fill the gap by studying the paths of critical points of $(\mathcal{P}_c^p)$ and
$(\mathcal{L}_\lambda^p)$, which include local minima/maxima and saddle points. Strictly speaking, we address the following pair of
problems:

$$ (\mathcal{CP}_c^p) \quad \text{find critical points of } (\mathcal{P}_c^p); \tag{4} $$

$$ (\mathcal{CL}_\lambda^p) \quad \text{find critical points of } (\mathcal{L}_\lambda^p). \tag{5} $$

Critical points are defined by the first-order condition in their neighborhoods. There are in general multiple critical
points corresponding to each value of $c$ or $\lambda$. A critical point can therefore be regarded as a multiple-valued
function of $c$ (or $\lambda$). We divide the set of all critical points into the smallest number of subsets, each of which forms
a continuous curve in $\mathbb{R}^n$ that is a single-valued function of $c$ (or $\lambda$). We call each of these curves a critical path
of $(\mathcal{P}_c^p)$ (or $(\mathcal{L}_\lambda^p)$), or simply a $(\mathcal{CP}_c^p)$ path (or a $(\mathcal{CL}_\lambda^p)$ path) for short. A remarkable difference from the case of
$p = 1$ is that the correspondence between $c$ and $\lambda$ has multiplicity; a single value of $\lambda$ corresponds to multiple
values of $c$. A critical path is a piecewise smooth curve and its smooth segments are characterized by a differential
equation in $\mathbb{R}^n$. The support of a critical point changes at each breakpoint at which the direction of the curve
changes discontinuously. (A breakpoint is indeed a connection point of smooth curves in a critical path.) At any
breakpoint, (i) $\lambda = 0$ and (ii) the partial derivative of $\varphi$ with respect to every nonzero component of $\beta$ is zero. We
analyze the critical paths based on the variational method and present the connection theorem that states that two
curves touch tangentially at the breakpoint connecting them.

In the third part, we study two paths of critical points connecting the origin and an ordinary least squares (OLS)
solution: a main path and a greedy path. A main path starts from an OLS solution and the active indices become
inactive at breakpoints one by one. A greedy path, on the other hand, starts from the origin and indices become
active at breakpoints one by one. A simple modification can make the greedy path coincide with the main path.
Part of the greedy path, on which the Hessian matrix is positive semidefinite, can be constructed with a generalized
Minkowskian gradient. Both paths are composed of a union of critical paths, and hence are piecewise smooth
curves. The breakpoints of the greedy path coincide exactly with the step-by-step solutions generated by orthogonal
matching pursuit (OMP) [17] and thus bridge OMP and $\ell_p$-regularized least squares problems. This link is more
direct than the one between OMP and the $\ell_1$ minimization established in [7].

II. Global Solution Paths 

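In this section, we study the solution paths of $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ with $c$ and $\lambda$ in (2) and (3), respectively, sliding
from zero to infinity. We refer to the paths simply as the $(\mathcal{P}_c^p)$-path and $(\mathcal{L}_\lambda^p)$-path. It is readily verified that

$$ \varphi(\beta) = \frac{1}{2} (\beta - \beta^*)^{\mathsf T} G (\beta - \beta^*) + \gamma, \quad \beta \in \mathbb{R}^n, \tag{6} $$

where $G := X X^{\mathsf T}$, $\gamma := \frac{1}{2} \left( \|y\|_2^2 - \beta^{*\mathsf T} G \beta^* \right)$ is a constant in $\beta$, and

$$ \beta^* := [\beta_1^*, \beta_2^*, \cdots, \beta_n^*]^{\mathsf T} \in V^* := \underset{\beta \in \mathbb{R}^n}{\operatorname{argmin}} \ \varphi(\beta) \tag{7} $$

is an OLS solution. In particular, $\beta^* := (X^{\mathsf T})^{\dagger} y$ with the Moore-Penrose pseudo-inverse $(X^{\mathsf T})^{\dagger}$ has the minimum
norm among all the OLS solutions.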
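The identity (6) can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: it takes $n = 2$, $d = 3$ with an invertible $G$ (so the OLS solution is unique and can be obtained from the normal equations $G\beta^* = Xy$ rather than from a pseudo-inverse), and all function names and data values are made up for the example.

```python
def matvec_T(X, beta):
    """Return X^T beta for X stored as n rows of length d."""
    return [sum(X[i][k] * beta[i] for i in range(len(X))) for k in range(len(X[0]))]


def phi(X, y, beta):
    """Least squares cost 0.5 * ||X^T beta - y||_2^2."""
    r = [a - b for a, b in zip(matvec_T(X, beta), y)]
    return 0.5 * sum(v * v for v in r)


def ols_2d(X, y):
    """Gram matrix G = X X^T and OLS solution for n = 2 (G assumed invertible)."""
    G = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(2)] for i in range(2)]
    Xy = [sum(a * b for a, b in zip(X[i], y)) for i in range(2)]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    # Cramer's rule for the 2x2 normal equations G beta = X y
    beta = [(G[1][1] * Xy[0] - G[0][1] * Xy[1]) / det,
            (G[0][0] * Xy[1] - G[1][0] * Xy[0]) / det]
    return G, beta


def quad_form(G, u):
    """Quadratic form u^T G u for a 2x2 matrix G."""
    return sum(u[i] * G[i][j] * u[j] for i in range(2) for j in range(2))


X = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]   # n = 2 parameters, d = 3 measurements (toy data)
y = [1.0, 0.0, 2.0]
G, beta_star = ols_2d(X, y)
gamma = 0.5 * (sum(v * v for v in y) - quad_form(G, beta_star))


def phi_quadratic(beta):
    """Right-hand side of (6): 0.5 * (beta - beta*)^T G (beta - beta*) + gamma."""
    u = [b - s for b, s in zip(beta, beta_star)]
    return 0.5 * quad_form(G, u) + gamma
```

On this instance `phi(X, y, beta)` and `phi_quadratic(beta)` agree for any `beta` up to rounding, and `gamma` equals the residual cost at the OLS solution.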

A. Global Minimum 

We denote by $\beta_c$ and $\beta_\lambda$ the global minima of $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ for given $c$ and $\lambda$, respectively. In the case of
$p \ge 1$, the following facts are well-known.



Fact 1 (For $p \ge 1$).

(a) $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ are convex problems.

(b) The $(\mathcal{P}_c^p)$-path is unique.

(c) The $(\mathcal{L}_\lambda^p)$-path is unique.

(d) The correspondence between the solutions of $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ is one to one, and $\lambda$ is a continuous,
monotone-decreasing, and single-valued function of $c \in (0, c^*)$, where $c^*$ is the minimum value of $F_p$
among all the OLS solutions.

In the present case of $p < 1$, however, there are remarkable differences between the two problems.

Fact 2 (For $p < 1$). $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ are nonconvex problems and local minima exist in general.

Theorem 1 (Relation between $(\mathcal{L}_\lambda^p)$ and $(\mathcal{P}_c^p)$ paths). For $p < 1$, the $(\mathcal{L}_\lambda^p)$-path is a proper subset of the $(\mathcal{P}_c^p)$-path.

Theorem 1 is the main result of this section, and it indicates an intrinsic difference between $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$.
Before proving it, we present a very simple example to facilitate understanding of the theorem.

Example 1 (Global solution paths for 1D case). Consider the following one-dimensional problem ($p = 0.5$):
$\varphi(\beta) := \frac{1}{2}(\beta - 1)^2$ and $F_{0.5}(\beta) := 2|\beta|^{0.5}$. It is clear that the solution of $(\mathcal{P}_c^{0.5})$ continuously changes from
$\beta = 0$ to the minimum $\beta^* = 1$ of $\varphi(\beta)$ as $c$ increases from $c = 0$ to $c^* := F_{0.5}(\beta^*) = 2$, and stays at $\beta = \beta^*$
as $c$ increases beyond $c^*$. The solution path of $(\mathcal{P}_c^{0.5})$ is, therefore, the interval $[0, 1]$ (see Fig. 1). In contrast, the
solution of $(\mathcal{L}_\lambda^{0.5})$ changes discontinuously at the origin, as will be shown below.

Figure 2 illustrates the graphs of the cost function $f_\lambda(\beta)$ in (3) for different values of $\lambda$. Looking at the red
curve, which corresponds to $\lambda = 0.3$, we can see that there is a pair of local minima (the first one at the origin and
the second one between 0.5 and 1) and a single local maximum between 0 and 0.5. As $\lambda$ decreases from $\lambda = 0.3$
gradually to zero, the second local minimum approaches $\beta^* (= 1)$ and the local maximum approaches the origin
while the first local minimum stays at the origin. Increasing $\lambda$, on the other hand, the second local minimum and
the local maximum approach each other and merge into a single inflection point at $\lambda = 0.385$ (the green curve).
As $\lambda$ increases beyond 0.385, $f_\lambda$ becomes a monotonically increasing function over $[0, \infty)$ (the blue curve which
corresponds to $\lambda = 0.5$). Therefore, there is a single local minimum at the origin, which is a sole critical point,
and no local maximum for $\lambda > 0.385$. Let us now consider how the solution (i.e., the global minimum) changes
depending on $\lambda$. Starting from a large $\lambda$, we decrease it gradually. The solution stays at the origin until $\lambda = 0.385$.
For $\lambda$ slightly smaller than 0.385, the global minimum still stays at the origin, since the value at the origin (the
first local minimum) is smaller than the one at the second local minimum, as in the case of $\lambda = 0.3$. However,
as $\lambda$ decreases further, $f_\lambda(\beta)$ at the second local minimum decreases, while $f_\lambda(0) = 0.5$ for any $\lambda \ge 0$. The
second local minimum eventually becomes a global minimum at some value, say $\lambda_{\mathrm{gl}}$, between 0.2 and 0.3. This
implies that the solution of $(\mathcal{L}_\lambda^{0.5})$ jumps from $\beta = 0$ to $\beta_{\mathrm{gl}} \in \mathbb{R}$, which is a global minimum of $f_{\lambda_{\mathrm{gl}}}$, satisfying
$f_{\lambda_{\mathrm{gl}}}(0) = f_{\lambda_{\mathrm{gl}}}(\beta_{\mathrm{gl}})$. As $\lambda$ decreases from $\lambda_{\mathrm{gl}}$ to zero, the solution changes from $\beta_{\mathrm{gl}}$ to $\beta^* (= 1)$. The solution path
of $(\mathcal{L}_\lambda^{0.5})$ thus consists of disjoint sets $\{0\} \cup [\beta_{\mathrm{gl}}, \beta^*]$ (see Fig. 3). Figure 3 will be discussed later in Example 2.
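The jump described in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming a brute-force grid search over $[0, 1]$ is an acceptable stand-in for the global minimizer (outside $[0, 1]$ the cost is never smaller); the names `f`, `global_min`, `LAM_CR`, `BETA_GL`, and `LAM_GL` are illustrative. The closed-form landmarks follow from setting $f_\lambda'(\beta) = \beta - 1 + \lambda \beta^{-0.5} = 0$, i.e. $\lambda = (1 - \beta)\sqrt{\beta}$ on the nonzero critical branch.

```python
import math


def f(beta, lam):
    """Penalized cost f_lambda(beta) = 0.5*(beta - 1)^2 + lam * 2*|beta|^0.5 of Example 1."""
    return 0.5 * (beta - 1.0) ** 2 + 2.0 * lam * math.sqrt(abs(beta))


def global_min(lam, grid=20001):
    """Brute-force global minimizer of f(., lam) on a uniform grid over [0, 1]."""
    betas = [k / (grid - 1) for k in range(grid)]
    return min(betas, key=lambda b: f(b, lam))


# lam = (1 - beta) * sqrt(beta) peaks at beta = 1/3: beyond this value of lam the
# nonzero critical points disappear (the inflection point of the text, about 0.385).
LAM_CR = (2.0 / 3.0) * math.sqrt(1.0 / 3.0)

# Solving f(0) = f(beta) together with lam = (1 - beta) * sqrt(beta) gives beta = 2/3,
# the landing point of the jump, and the corresponding lam (about 0.272).
BETA_GL = 2.0 / 3.0
LAM_GL = (1.0 - BETA_GL) * math.sqrt(BETA_GL)
```

Under these assumptions, `global_min(0.3)` stays at the origin while `global_min(0.25)` lies near 0.70, consistent with a jump occurring for some $\lambda$ between 0.2 and 0.3.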

B. Local Optimality in $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$

Apart from the global minimum, let us examine the conditions for local minimality in $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$. Lemma
1 below shows that $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ have different local-minimality characteristics. In $(\mathcal{P}_c^p)$, a point $\beta$ is a local
minimum when the function $\varphi$ is locally minimal over the (nonconvex) constraint set $B_c := \{\beta \in \mathbb{R}^n : F_p(\beta) \le c\}$.
In $(\mathcal{L}_\lambda^p)$, on the other hand, a point $\beta$ is a local minimum when the function $\varphi + \lambda F_p$ is locally minimal over the
whole Euclidean space $\mathbb{R}^n$. In short, local minimality in $(\mathcal{P}_c^p)$ is defined as that of the convex function over the
nonconvex constraint set $B_c$, whereas local minimality in $(\mathcal{L}_\lambda^p)$ is defined as that of the nonconvex function without
any constraint. This makes an essential difference between the local minimality conditions for $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$.

We can geometrically describe local minimality of a point $\beta$ in $(\mathcal{P}_c^p)$ as follows. Let $\mathcal{R}$ denote the contour of
the function $\varphi$ passing through the point $\beta$. Also, let $\partial B_c$ denote the boundary of $B_c$ for $c := F_p(\beta)$. Suppose
for simplicity that there exists a unique OLS solution $\beta^* := (X^{\mathsf T})^{\dagger} y$; i.e., $\varphi$ is strictly convex and the problem is
overdetermined. To exclude trivial cases, we will assume that $\beta^*$ (the center of $\mathcal{R}$) is located outside the constraint
set $B_c$. Suppose that $\beta$ has no zero components. In this case, $\beta$ is a local minimum if (i) the two surfaces $\mathcal{R}$ and
$\partial B_c$ touch each other (i.e., share the same tangent plane) at $\beta$, and (ii) $\partial B_c$ is closer to the tangent plane than $\mathcal{R}$ in
the vicinity of $\beta$ (see Fig. 4). In the case that $\beta$ has some zero components, the above geometric properties hold
in the subspace where zero-components of $\beta$ are fixed to zero.

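Given any vector $\beta := [\beta_1, \beta_2, \cdots, \beta_n]^{\mathsf T} \in \mathbb{R}^n$, we define the set of its active indices as $\mathrm{supp}(\beta) := \{i \in
\{1, 2, \cdots, n\} : \beta_i \ne 0\}$. Let $\mathcal{I} := \{i_1, i_2, \cdots, i_s\} := \mathrm{supp}(\beta)$, where $s := |\mathrm{supp}(\beta)|$ means the cardinality
of $\mathrm{supp}(\beta)$; i.e., $\beta$ is supposed to have $s$ nonzero entries $\beta_{i_1}, \beta_{i_2}, \cdots, \beta_{i_s} \ne 0$. Define a sub-vector $\beta_{\mathcal{I}} :=
[\beta_{i_1}, \beta_{i_2}, \cdots, \beta_{i_s}]^{\mathsf T} \in \mathbb{R}^s$ of $\beta$ consisting of its nonzero components. We denote by $\nabla_{\mathcal{I}}$ the gradient in terms of the
nonzero components; e.g.,

$$ \nabla_{\mathcal{I}} \varphi(\beta) := [\partial_{i_1} \varphi(\beta), \partial_{i_2} \varphi(\beta), \cdots, \partial_{i_s} \varphi(\beta)]^{\mathsf T} \in \mathbb{R}^s, $$

where the simplified notation $\partial_i$ is used rather than $\partial / \partial \beta_i$ to denote the partial derivative with respect to $\beta_i$. The
first and second derivatives of $\psi_p(\beta) \ (:= \frac{1}{p} |\beta|^p)$ at a point $\beta \ne 0$ are, respectively, given by

$$ \psi_p'(\beta) = \mathrm{sgn}(\beta) |\beta|^{-(1-p)}, \tag{8} $$

$$ \psi_p''(\beta) = -(1-p) |\beta|^{-(2-p)}, \tag{9} $$

where $\mathrm{sgn}(\cdot)$ is the signum function. The following lemma presents necessary and sufficient conditions for local
minimality in $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$.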

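Lemma 1 (Necessary and sufficient conditions for local minimality in $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$).

1) A vector $\beta$ is a local minimum of $(\mathcal{L}_\lambda^p)$ if, and only if, (i) it satisfies the first-order condition,

$$ \nabla_{\mathcal{I}} \varphi(\beta) = -\lambda \nabla_{\mathcal{I}} F_p(\beta), \tag{10} $$

where $\mathcal{I} := \mathrm{supp}(\beta)$, and (ii) the Hessian matrix,

$$ K(\beta) := \nabla_{\mathcal{I}} \nabla_{\mathcal{I}} (\varphi + \lambda F_p)(\beta) \tag{11} $$

is positive semidefinite.

2) A vector $\beta$ is a local minimum of $(\mathcal{P}_c^p)$ if, and only if, (i) it satisfies the first-order condition,

$$ \nabla_{\mathcal{I}} \varphi(\beta) = -\lambda_c \nabla_{\mathcal{I}} F_p(\beta) \tag{12} $$

for some $\lambda_c \ge 0$, where $\mathcal{I} := \mathrm{supp}(\beta)$, and (ii) the Hessian matrix $K(\beta)$ with $\lambda := \lambda_c$ is either positive
semidefinite (for all vectors) or positive definite for any tangent vector $e$ of the contour of $F_p$ passing through
$\beta$; i.e., $x^{\mathsf T} K(\beta) x \ge 0$ for all $x \in \mathbb{R}^{|\mathcal{I}|}$, or $e^{\mathsf T} K(\beta) e > 0$ for all $e \ne 0$ satisfying $\nabla_{\mathcal{I}} F_p(\beta)^{\mathsf T} e = 0$.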

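Proof: Lemma 1.1 is clear. We prove Lemma 1.2 as follows. Although the statement is true for an arbitrary $\mathcal{I}$, we
only provide a proof for the case that $\mathcal{I} = \{1, 2, \cdots, n\}$. We drop the index $\mathcal{I}$ for simplicity. The first part is a
condition for $\beta$ to be a critical point. Noting that every local minimum, say $\beta$, satisfies $F_p(\beta) \le F_p(\beta^*)$, $\beta$ is a
local minimum if, and only if, there exists a $\delta > 0$ such that

$$ \varphi(\beta + \Delta\beta) \ge \varphi(\beta) \tag{13} $$

for any $\Delta\beta \in \mathbb{R}^n$ satisfying

$$ F_p(\beta + \Delta\beta) = F_p(\beta), \tag{14} $$

$$ \|\Delta\beta\|_2 < \delta. \tag{15} $$

For a sufficiently small $\delta > 0$, Taylor expansions of $\varphi$ and $F_p$ are, respectively, given by

$$ \varphi(\beta + \Delta\beta) - \varphi(\beta) = \nabla\varphi(\beta)^{\mathsf T} \Delta\beta + \frac{1}{2} \Delta\beta^{\mathsf T} \nabla\nabla\varphi(\beta) \Delta\beta, \tag{16} $$

$$ F_p(\beta + \Delta\beta) - F_p(\beta) = \nabla F_p(\beta)^{\mathsf T} \Delta\beta + \frac{1}{2} \Delta\beta^{\mathsf T} \nabla\nabla F_p(\beta) \Delta\beta, \tag{17} $$

where higher order terms have been neglected, and $\Delta\beta$ can be decomposed from (14) as

$$ \Delta\beta = \nu e + \alpha n, \quad \nu > 0, \quad (0 <)\, \alpha = o(\nu), \tag{18} $$

where $e$ and $n$ denote a tangent vector and a normal vector of the contour of $F_p$ passing through $\beta$, respectively.
From (12), (14), and (17), we obtain

$$ \nabla\varphi(\beta)^{\mathsf T} \Delta\beta = -\lambda_c \nabla F_p(\beta)^{\mathsf T} \Delta\beta = \frac{\lambda_c}{2} \Delta\beta^{\mathsf T} \nabla\nabla F_p(\beta) \Delta\beta, \tag{19} $$

which yields, together with (16) and (18),

$$ \varphi(\beta + \Delta\beta) - \varphi(\beta) = \frac{1}{2} \Delta\beta^{\mathsf T} K(\beta) \Delta\beta = \frac{\nu^2}{2} e^{\mathsf T} K(\beta) e + \nu\alpha\, e^{\mathsf T} K(\beta) n + \frac{\alpha^2}{2} n^{\mathsf T} K(\beta) n. \tag{20} $$

This proves the second part. A proof for an arbitrary $\mathcal{I}$ can be obtained by noting that, due to (14), the norm of
$\Delta\beta_{\bar{\mathcal{I}}}$, where $\bar{\mathcal{I}} := \{1, 2, \cdots, n\} \setminus \mathcal{I}$, diminishes quickly as $\delta$ approaches zero. □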

Remark 1 (Difference between $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ in terms of local minimality in Lemma 1). The positive semidefiniteness
of the Hessian matrix $K(\beta)$ is a necessary and sufficient condition for $(\mathcal{L}_\lambda^p)$, whereas it is only sufficient for
$(\mathcal{P}_c^p)$. It is, therefore, possible that a vector $\beta$ is a local minimum of $(\mathcal{P}_c^p)$, but a saddle of $(\mathcal{L}_\lambda^p)$, as will be shown
in Example 3 in Section III-B. Indeed, the RHS of (20) is positive for a sufficiently small $\nu > 0$ if $e^{\mathsf T} K(\beta) e > 0$
(even if $e^{\mathsf T} K(\beta) n$ and $n^{\mathsf T} K(\beta) n$ are negative); i.e., $K(\beta)$ is allowed to be not positive semidefinite for a normal
vector $n$.

C. Proof of Theorem 1 

It is not difficult to see that the $(\mathcal{L}_\lambda^p)$-path is a subset of the $(\mathcal{P}_c^p)$-path. The properness is verified by the following
lemma derived from Lemma 1.

Lemma 2. For any $p \in (0, 1)$,

(a) the $(\mathcal{P}_c^p)$-path is continuous at $\beta = 0$;

(b) the $(\mathcal{L}_\lambda^p)$-path is discontinuous at $\beta = 0$.



Proof of Lemma 2:

Proof of (a): It is clear that $\beta(0) = 0$ since $\{\beta : F_p(\beta) \le 0\} = \{0\}$ and that $\|\beta(c) - \beta(0)\|_2 = \|\beta(c)\|_2 \to 0$ as
$c \to 0$, implying the continuity of the $(\mathcal{P}_c^p)$-path at the origin.

Proof of (b): Notice that $\varphi$ is differentiable over $\mathbb{R}^n$. The function $F_p(\beta)$ can be expressed as $F_p(\beta) = \sum_{i=1}^{n} \psi(\beta_i)$,
where $\psi(\beta) := \frac{1}{p} |\beta|^p$ for $\beta \in \mathbb{R}$. It can be verified that $\lim_{\beta \downarrow 0} \frac{d}{d\beta} \psi(\beta) = \infty$ and $\lim_{\beta \uparrow 0} \frac{d}{d\beta} \psi(\beta) = -\infty$. This
implies that $\beta = 0$ is a local minimum of the function $\varphi(\beta) + \lambda F_p(\beta)$ for any $\lambda > 0$ and no other local minima exist
in a neighborhood of $\beta = 0$. Thus, the $(\mathcal{L}_\lambda^p)$-path is discontinuous at $\beta = 0$. □

III. Paths of Critical Points

Section II showed that the $(\mathcal{L}_\lambda^p)$-path is always discontinuous and is different from the $(\mathcal{P}_c^p)$-path, which is
continuous at $\beta = 0$. It is beneficial to extend LARS to the nonconvex case of $p < 1$ in such a way that the path
is continuous. Here, we extend the criterion from one of minimality to one of criticality for the two problems, and
consider continuous paths of critical points. Although we denoted the dependency of $\lambda$ on $c$ by $\lambda_c$ in (12), we will
denote it by $\lambda(c)$ when viewing $\lambda$ as a function of $c$. Similarly, we use the notation $c(\lambda)$.

A. Critical point 

The definition of critical points is as follows. 

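Definition 1 (Critical point). When $\beta \in \mathbb{R}^n$ satisfies the first-order condition

$$ \nabla_{\mathcal{I}} \varphi(\beta) = -\tilde{\lambda} \nabla_{\mathcal{I}} F_p(\beta) \tag{21} $$

for some $\tilde{\lambda} \ge 0$, where $\mathcal{I} := \mathrm{supp}(\beta)$, it is called a critical point of $(\mathcal{P}_c^p)$ for $c := F_p(\beta)$, or a critical point of
$(\mathcal{L}_\lambda^p)$ for $\lambda := \tilde{\lambda}$.

Note that condition (21) can be expressed as follows:

$$ -\frac{\partial_i \varphi(\beta)}{\partial_i F_p(\beta)} = -\frac{\partial_i \varphi(\beta)}{\psi_p'(\beta_i)} = \tilde{\lambda}, \quad \forall i \in \mathcal{I}, \ \exists \tilde{\lambda} \ge 0. \tag{22} $$

Geometrically speaking, $\beta$ is a critical point when the two surfaces $\mathcal{R}$ and $\partial B_c$ (see Section II-B) share the same
tangent plane at $\beta$. At a critical point $\beta$, the function $\varphi$ takes a critical value over $B_c$ for $c := F_p(\beta)$, and, at the
same time, the function $\varphi + \lambda F_p$ takes a critical value over $\mathbb{R}^n$.

Proposition 1. The following statements hold.

1) A critical point of $(\mathcal{P}_c^p)$ for any $c \ge 0$ is a critical point of $(\mathcal{L}_\lambda^p)$ for some $\lambda \ge 0$.

2) A critical point of $(\mathcal{L}_\lambda^p)$ for any $\lambda \ge 0$ is a critical point of $(\mathcal{P}_c^p)$ for some $c \ge 0$.

In the rest of this section, we consider problems $(\mathcal{CP}_c^p)$ and $(\mathcal{CL}_\lambda^p)$ rather than $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$.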

B. Critical path 

The set of critical points for $(\mathcal{CP}_c^p)$, which is the same as that for $(\mathcal{CL}_\lambda^p)$, is given as

$$ \mathcal{C} := \left\{ \beta \in \mathbb{R}^n : \text{there exists } \lambda \ge 0 \text{ s.t. (21) holds} \right\}. \tag{23} $$

Some important observations are listed below.

1) A local minimum of $(\mathcal{L}_\lambda^p)$ is a local minimum of $(\mathcal{P}_c^p)$, but the converse is not true.

2) The correspondence between $c$ and $\lambda(c)$ has multiplicity.

3) The paths of the global minima of $(\mathcal{P}_c^p)$ and $(\mathcal{L}_\lambda^p)$ are both subsets of $\mathcal{C}$.

4) The path of the global minima of $(\mathcal{L}_\lambda^p)$ is always discontinuous.

5) The path of the global minima of $(\mathcal{P}_c^p)$ is possibly discontinuous (see Example A.2 in the appendix).

Each critical point $\beta$ is associated with a certain value of $c \ (= F_p(\beta))$, and in general, there are multiple critical
points that are associated with a single value of $c$. It is clear that the origin is a unique critical point associated with
$c = 0$.^2 As $c$ increases from zero, the multiple critical points associated with each value of $c$ draw multiple curves
in $\mathbb{R}^n$. We call each such curve a critical path of $(\mathcal{P}_c^p)$, which is defined formally below. Intuitively, a critical path
is a maximal continuous curve that is a single-valued function of $c$ (or $\lambda$).

Definition 2 (Critical path). 

1) A subset $C \subset \mathcal{C}$ is called a critical path of $(\mathcal{P}_c^p)$, or a $(\mathcal{CP}_c^p)$ path for short, if (i) the mapping $T : C \to S \subset
[0, \infty)$, $\beta \mapsto c = F_p(\beta)$ has a one-to-one continuous inverse mapping $T^{-1}$, and (ii) none of the proper
supersets of $C$ satisfies condition (i).

2) A subset $C \subset \mathcal{C}$ is called a critical path of $(\mathcal{L}_\lambda^p)$, or a $(\mathcal{CL}_\lambda^p)$ path for short, if (i) the mapping $T : C \to
S \subset [0, \infty)$, $0 \mapsto 0$, $\beta (\ne 0) \mapsto \lambda = -\partial_i \varphi(\beta) / \partial_i F_p(\beta)$, $i \in \mathrm{supp}(\beta)$, has a one-to-one continuous inverse
mapping $T^{-1}$, and (ii) none of the proper supersets of $C$ satisfies condition (i).

Typical examples of critical paths are given below to give the reader an intuitive understanding before the general
analysis of critical paths.

Example 2 (Critical paths for 1D case). Consider the critical paths for the functions considered in Example 1. The
function $f_\lambda$ has possibly three critical points: $\beta = 0$ for any $\lambda \ge 0$ and points $\beta_\lambda$ included in the set

$$ R_\lambda(\beta^*) := \left\{ \beta > 0 : f_\lambda'(\beta) = \beta - \beta^* + \lambda \beta^{-0.5} = 0 \right\} \tag{24} $$

when $R_\lambda(\beta^*) \ne \emptyset$. It can be verified that $|R_\lambda(\beta^*)| = 2$ for $\lambda < 2 (\beta^*/3)^{3/2}$ (see Fig. 3). When three critical points
exist for a $\lambda$, one is $\beta = 0$. The larger element of $R_\lambda(\beta^*)$ and $\beta = 0$ are the local/global minima of $f_\lambda$; the other
one is the local maximum, as illustrated in Fig. 3. While the $(\mathcal{L}_\lambda^{0.5})$ global path is the disjoint set $\{0\} \cup [\beta_{\mathrm{gl}}, \beta^*]$
for a $\beta_{\mathrm{gl}}$, the $(\mathcal{CL}_\lambda^{0.5})$ critical paths are two intervals: $[0, \beta_{\mathrm{cr}}]$ and $[\beta_{\mathrm{cr}}, \beta^*]$ for a $\beta_{\mathrm{cr}}$. In contrast, the $(\mathcal{CP}_c^{0.5})$ critical
path coincides with the $(\mathcal{P}_c^{0.5})$ global path $[0, 1]$ in this case, although this is not always true.
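The set $R_\lambda(\beta^*)$ in (24) and the classification of its elements can be explored numerically. The following is a sketch only: it uses plain bisection, relies on the observation that $f_\lambda'$ is negative at $\beta^*/3$ whenever $\lambda < 2(\beta^*/3)^{3/2}$ (so that $\beta^*/3$ separates the two roots), and all function names are illustrative.

```python
import math


def g(beta, lam, beta_star=1.0):
    """Derivative f'_lambda(beta) = beta - beta_star + lam * beta**(-0.5) for beta > 0, cf. (24)."""
    return beta - beta_star + lam / math.sqrt(beta)


def bisect(func, lo, hi, iters=200):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    flo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = func(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nonzero_critical_points(lam, beta_star=1.0):
    """Elements of R_lambda(beta_star): an empty list, or [local maximum, local minimum]."""
    lam_cr = 2.0 * (beta_star / 3.0) ** 1.5
    if lam <= 0.0 or lam >= lam_cr:
        return []
    turn = beta_star / 3.0                     # g(turn) < 0, so turn separates the two roots
    tiny = (lam / (2.0 * beta_star)) ** 2      # g(tiny) = tiny + beta_star > 0
    h = lambda b: g(b, lam, beta_star)
    return [bisect(h, tiny, turn), bisect(h, turn, beta_star)]


def second_derivative(beta, lam):
    """f''_lambda(beta) = 1 - 0.5 * lam * beta**(-1.5), the scalar version of K in (11)."""
    return 1.0 - 0.5 * lam * beta ** (-1.5)
```

For $\lambda = 0.3$ and $\beta^* = 1$ this returns two points, roughly 0.115 and 0.618; the second derivative is negative at the first (local maximum) and positive at the second (local minimum). For $\lambda = 0.4$ the list is empty. The two branches meet at $\beta = \beta^*/3$, which is the value $\beta_{\mathrm{cr}}$ of this example.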

^2 In this case, the set of active indices $\mathcal{I}$ is an empty set, and hence, the condition is automatically satisfied.



Example 3 (2D orthogonal case). Consider the following two-dimensional case under the orthogonality condition
$G := X X^{\mathsf T} = I$: $\varphi(\beta) := \frac{1}{2} \|\beta - \beta^*\|_2^2$ with $\beta^* := [2, 1]^{\mathsf T}$ and $F_{0.5}(\beta) := 2(|\beta_1|^{0.5} + |\beta_2|^{0.5})$. In this special
case, $f_\lambda(\beta)$ can be decomposed as

$$ f_\lambda(\beta) = f_{\lambda,1}(\beta_1) + f_{\lambda,2}(\beta_2), \tag{25} $$

where $f_{\lambda,1}(\beta) := \frac{1}{2}(\beta - \beta_1^*)^2 + 2\lambda |\beta|^{0.5}$ and $f_{\lambda,2}(\beta) := \frac{1}{2}(\beta - \beta_2^*)^2 + 2\lambda |\beta|^{0.5}$. Figure 5 plots the critical points
$\beta_{1,\lambda} \in R_\lambda(\beta_1^*) \cup \{0\}$ and $\beta_{2,\lambda} \in R_\lambda(\beta_2^*) \cup \{0\}$ as a function of $\lambda$. Recalling Example 2, $f_{\lambda,1}$ and $f_{\lambda,2}$ each have
three critical points within a certain range of $\lambda$, and they each form three branches in Fig. 5: A1, B1, and C1
for $f_{\lambda,1}$ and A2, B2, and C2 for $f_{\lambda,2}$. Note that $f_{\lambda,1}(\beta_1)$ and $f_{\lambda,2}(\beta_2)$ are coupled through a common $\lambda$. Given a
small $\lambda$, there are 9 (= 3 × 3) ways of choosing the pair of critical points $(\beta_{1,\lambda}, \beta_{2,\lambda})$ from any pair of branches,
(A1,A2), (A1,B2), (A1,C2), (B1,A2), ..., (C1,C2). Each pair forms a $(\mathcal{CL}_\lambda^{0.5})$ path, although (C1,C2) is trivial as it
corresponds to the origin. Excluding the trivial one, there are eight other $(\mathcal{CL}_\lambda^{0.5})$ paths (see Fig. 6(a)).

For instance, let us start from $\lambda = 0$ in Fig. 5 and trace a critical path from the origin in Fig. 6(a). We increase
$\lambda$ and follow the branches B1 and C2 until we reach the edge of B1 at which A1 and B1 are connected. This
corresponds to the blue dotted line (labeled by B1C2) in Fig. 6(a). Each point on the (B1,C2) path is a saddle
point of $f_\lambda$, since $\beta_{1,\lambda}$ is a local maximum of $f_{\lambda,1}$ and $\beta_{2,\lambda} = 0$ is a local minimum of $f_{\lambda,2}$. From the edge of B1,
we follow the branches A1 and C2 by decreasing $\lambda$ down to zero. This corresponds to tracing the blue solid line
(labeled by A1C2) from the triangle. Each point on the (A1,C2) path is a local minimum of $f_\lambda$ since both $\beta_{1,\lambda}$ and
$\beta_{2,\lambda} = 0$ are local minima of $f_{\lambda,1}$ and $f_{\lambda,2}$, respectively. In an analogous way, one can associate every critical
path with a pair of branches in Fig. 5. Note that only the pair (B1,B2) gives local maxima of $f_\lambda$.

The union of the four paths (B1,C2), (A1,C2), (A1,B2), and (A1,A2) of $(\mathcal{CL}_\lambda^{0.5})$ forms a $(\mathcal{CP}_c^{0.5})$ path that starts
from the origin and reaches the OLS solution $\beta^*$ through the breakpoint $[2, 0]^{\mathsf T}$. Fig. 6(b) depicts each $(\mathcal{CP}_c^{0.5})$ path
in Fig. 6(a) as a function of $c$. It can be seen that $c$ increases monotonically along any of the paths. A question now
is how $\lambda(c)$ changes with $c$ along the paths; this is depicted in Fig. 6(c). It can be seen that $\lambda(c)$ is non-monotonic
in $c$ and the correspondence between $c$ and $\lambda(c)$ has multiplicity. Note that the points marked by triangles along
the paths in Figs. 6(a), (b) correspond to the peaks in Fig. 6(c) at which $\dot{\lambda}(c) := \frac{d}{dc} \lambda(c) = 0$ and a change from
a local maximum to a local minimum in $(\mathcal{CL}_\lambda^{0.5})$ occurs. Regarding the global solution paths, the $(\mathcal{P}_c^{0.5})$ path is the
whole blue curve in Fig. 6(a), while the $(\mathcal{L}_\lambda^{0.5})$ path consists of three disjoint sets $\{0\}$, a subset of the (A1,C2)
path, and a subset of the (A1,A2) path (cf. [18]). The parameter $c$ is a monotonically decreasing and discontinuous
function of $\lambda$.
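The non-monotonic, multi-valued relation between $c$ and $\lambda$ in Example 3 can be tabulated directly, because in the orthogonal case condition (22) with $p = 0.5$ reduces to $\lambda = (\beta_i^* - \beta_i)\sqrt{\beta_i}$ on every nonzero coordinate. The sketch below samples the path from the origin through $[2, 0]^{\mathsf T}$ to $\beta^* = [2, 1]^{\mathsf T}$; the parameterization by the active coordinate, the bisection used for branch A1, and all names are assumptions made for illustration.

```python
import math


def lam_from_beta(beta, beta_star):
    """Multiplier on a nonzero coordinate, lam = (beta_star - beta) * sqrt(beta), from (22), p = 0.5."""
    return (beta_star - beta) * math.sqrt(beta)


def branch_A(lam, beta_star, iters=200):
    """Larger root of (beta_star - beta) * sqrt(beta) = lam, searched on [beta_star/3, beta_star]."""
    lo, hi = beta_star / 3.0, beta_star    # lam_from_beta decreases from its peak to 0 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam_from_beta(mid, beta_star) > lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def path_samples(m=400):
    """(c, lam) pairs along origin -> [2, 0] -> [2, 1] for beta_star = [2, 1]."""
    out = []
    for k in range(m + 1):                 # first leg: beta = (t, 0), branches B1 then A1
        t = 2.0 * k / m
        out.append((2.0 * math.sqrt(t), lam_from_beta(t, 2.0)))
    for k in range(1, m + 1):              # second leg: beta = (A1 root, s), branches B2 then A2
        s = k / m
        lam = lam_from_beta(s, 1.0)
        b1 = branch_A(lam, 2.0)
        out.append((2.0 * (math.sqrt(b1) + math.sqrt(s)), lam))
    return out


samples = path_samples()
cs = [c for c, _ in samples]
lams = [l for _, l in samples]
# Number of times the level lam = 0.2 is crossed along the path
crossings = sum(1 for a, b in zip(lams, lams[1:]) if (a - 0.2) * (b - 0.2) < 0)
```

In this sketch `cs` increases strictly along the path, `lams` rises and falls on each leg (returning to zero at the breakpoint and at the OLS solution), and the level $\lambda = 0.2$ is crossed four times, so four distinct values of $c$ share that $\lambda$.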

In the non-orthogonal case, critical paths similar to the case of Example 3 are obtained although the function $f_\lambda$
cannot be separated as in (25). (See Example A.1 in the appendix.)

C. Analysis

Let us analyze critical paths of $(\mathcal{P}_c^p)$, while taking the critical point $\beta(c)$ to be a function of $c \ge 0$. How does
$\beta(c)$ behave as $c$ changes? The behavior can be described by a differential equation governing the tangent direction
$\dot{\beta}(c) := \frac{d}{dc} \beta(c)$ of the path $\beta(c)$. Let $\beta := \beta(c)$ and $\tilde{\lambda} := \lambda(c)$ in (21), and let us differentiate both sides with
respect to $c$. After simple manipulations, we obtain the equation of the critical path:

$$ K(\beta(c)) \dot{\beta}_{\mathcal{I}}(c) = -\dot{\lambda}(c) \nabla_{\mathcal{I}} F_p(\beta(c)) = \frac{\dot{\lambda}(c)}{\lambda(c)} \nabla_{\mathcal{I}} \varphi(\beta(c)). \tag{26} $$

One needs to carefully study those points at which the following situations occur.

1) The matrix $K(\beta(c))$ is singular.

2) $\dot{\lambda}(c) = 0$.

3) $\beta(c)$ is a breakpoint where the support of $\beta(c)$ changes.

In Fig. 6(a), the triangle indicates a separation point $\beta(c')$ of $(\mathcal{CL}_\lambda^{0.5})$; the smooth part of the path separates
into a pair of $(\mathcal{CL}_\lambda^{0.5})$ paths. Viewing Fig. 6(c), one can see that $\dot{\lambda}(c') = 0$ holds at every separation point $\beta(c')$
of $(\mathcal{CL}_\lambda^{0.5})$ paths. The matrix $K(\beta(c'))$ is also singular with $\dot{\beta}_{\mathcal{I}}(c')$ being its eigenvector associated with the zero
eigenvalue, since $\nabla_{\mathcal{I}} F_p(\beta(c'))$ is bounded and $\dot{\beta}_{\mathcal{I}}(c') \ne 0$. The situations described in items 1) and 2) above
happen simultaneously, as shown by the following theorem.



Theorem 2 (On singular points). On a $(\mathcal{CP}_c^p)$ path excluding its breakpoints and edges, the following two statements
are equivalent if there is no other $(\mathcal{CP}_c^p)$ path passing through the point $\beta(c')$:

(a) $\dot{\lambda}(c') = 0$;

(b) $K(\beta(c'))$ is singular.

Proof: It has already been seen above that (a) $\Rightarrow$ (b). Assume that $\dot{\lambda}(c') \ne 0$. Suppose that $K(\beta(c'))$ is singular.
Then, since $\nabla_{\mathcal{I}} F_p(\beta(c')) \ne 0$, there is no $\dot{\beta}_{\mathcal{I}}(c')$ satisfying (26), or there are infinitely many $\dot{\beta}_{\mathcal{I}}(c')$ satisfying (26)
and the set of such $\dot{\beta}_{\mathcal{I}}(c')$s forms a linear variety which is unbounded. This implies that the path is discontinuous.
Hence, $K(\beta(c'))$ should be nonsingular. Indeed, the nonsingularity of $K(\beta(c'))$ ensures the existence of a unique
vector $\dot{\beta}_{\mathcal{I}}(c')$ that satisfies (26). This verifies that (b) $\Rightarrow$ (a). □
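Equation (26) and Theorem 2 can be sanity-checked on the one-dimensional problem of Example 1, where the critical path is available in closed form: $c = 2\sqrt{\beta}$ gives $\beta(c) = (c/2)^2$, and the first-order condition gives $\lambda(c) = (1 - \beta)\sqrt{\beta}$. The sketch below replaces the derivatives with respect to $c$ by central finite differences; the step size and all names are assumptions chosen for illustration.

```python
import math

P = 0.5  # exponent p of Example 1


def beta_of_c(c):
    """1-D critical path of Example 1, parameterized by c = F_p(beta) = 2*sqrt(beta)."""
    return (c / 2.0) ** 2


def lam_of_c(c):
    """lam(c) = (1 - beta) * sqrt(beta), from the first-order condition with beta_star = 1."""
    b = beta_of_c(c)
    return (1.0 - b) * math.sqrt(b)


def K(c):
    """Scalar Hessian K = 1 - lam * (1 - p) * beta**(-(2 - p)), cf. (9) and (11)."""
    b = beta_of_c(c)
    return 1.0 - lam_of_c(c) * (1.0 - P) * b ** (-(2.0 - P))


def ddc(func, c, h=1e-6):
    """Central finite difference, standing in for the derivative with respect to c."""
    return (func(c + h) - func(c - h)) / (2.0 * h)


def residual_26(c):
    """K * beta_dot + lam_dot * F_p'(beta): zero when the critical-path equation (26) holds."""
    b = beta_of_c(c)
    grad_F = b ** (-(1.0 - P))             # psi'_p(beta) for beta > 0, cf. (8)
    return K(c) * ddc(beta_of_c, c) + ddc(lam_of_c, c) * grad_F


C_SING = 2.0 * math.sqrt(1.0 / 3.0)        # value of c at beta = 1/3, where lam(c) peaks
```

Under these assumptions `residual_26(c)` vanishes up to finite-difference error for $c \in (0, 2)$, and both `K(C_SING)` and the finite-difference derivative of `lam_of_c` at `C_SING` are numerically zero, in line with the equivalence stated in Theorem 2.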

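Now consider a situation in which we follow a critical path towards a breakpoint with $\beta_1 > 0$ approaching zero;
e.g., follow the (B1,A2) path towards the breakpoint $[0, 1]^{\mathsf T}$ in Fig. 6(a). A simple inspection of (21) suggests that
$\lambda = 0$ and $\partial_i \varphi(\beta) = 0$ for all $i \in \mathcal{I} \setminus \{1\}$ at the breakpoint, since $\partial_1 F_p(\beta) \to \infty$ as $\beta_1 \downarrow 0$, $\partial_i F_p(\beta) < \infty$,
$\forall i \in \mathcal{I} \setminus \{1\}$, and $\partial_i \varphi(\beta) < \infty$, $\forall i \in \mathcal{I}$. To analyze this situation in more detail, we will study the first component
in (26):

$$ \sum_{j \in \mathcal{I}} g_{1j} \dot{\beta}_j(c) - (1-p) \lambda(c) \beta_1^{-(2-p)}(c) \dot{\beta}_1(c) = -\dot{\lambda}(c) \beta_1^{-(1-p)}(c), \tag{27} $$

where it is assumed for simplicity that $\beta_1(c) > 0$. Multiplying both sides of (27) by $\beta_1^{2-p}(c)$ and letting $\beta_1(c) \to 0$
yield

$$ (1-p) \frac{\dot{\beta}_1(c)}{\beta_1(c)} = \frac{\dot{\lambda}(c)}{\lambda(c)} \ \Leftrightarrow \ (1-p) \frac{d}{dc} \log \beta_1(c) = \frac{d}{dc} \log \lambda(c) \ \Leftrightarrow \ \beta_1^{1-p}(c) \propto \lambda(c). \tag{28} $$

It is readily verified that

$$ \dot{\beta}_1(c) \propto \dot{\lambda}(c) \lambda^{\eta}(c) \tag{29} $$

for $\eta := \frac{p}{1-p}$. Meanwhile, it holds that

$$ \partial_i \varphi(\beta) = -\lambda(c) \partial_i F_p(\beta) = -\lambda(c) \, \mathrm{sgn}(\beta_i) |\beta_i|^{-(1-p)}, \quad \forall i \in \mathcal{I}. \tag{30} $$

Let $\beta_{\mathrm{BR}}$ denote a breakpoint with its support $\mathcal{I}$ and $i' \notin \mathcal{I}$ an index that becomes active at $\beta_{\mathrm{BR}}$. Then, we can
verify the following theorem from (28) - (30).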

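Theorem 3 (Properties of breakpoints). At any breakpoint $\beta_{\mathrm{BR}} = \beta(c_{\mathrm{BR}})$, it holds that

1) $\lambda(c_{\mathrm{BR}}) = 0$;

2) $\partial_i \varphi(\beta_{\mathrm{BR}}) = 0$, $i \in \mathcal{I}$;
$\partial_j \varphi(\beta_{\mathrm{BR}}) \ne 0$, $j \notin \mathcal{I}$.

Moreover, $\dot{\beta}_{i'}(c) \to 0$ as $\beta(c)$ with $\mathrm{supp}(\beta(c)) = \mathcal{I} \cup \{i'\}$ approaches the breakpoint $\beta_{\mathrm{BR}}$.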
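These breakpoint properties can be observed on Example 3 near the breakpoint $[2, 0]^{\mathsf T}$, where the second coordinate has just become active. The sketch below assumes the orthogonal setting of that example ($p = 0.5$, $\beta^* = [2, 1]^{\mathsf T}$), in which (22) reads $\lambda = (\beta_i^* - \beta_i)\sqrt{\beta_i}$ on each active coordinate; the function name and the bisection bounds are illustrative.

```python
import math


def leg_point(s, iters=200):
    """Critical point (beta_1, beta_2 = s) near the breakpoint [2, 0], with its multiplier lam."""
    lam = (1.0 - s) * math.sqrt(s)          # second coordinate of (22) with beta_2^* = 1
    lo, hi = 2.0 / 3.0, 2.0                 # branch A1: (2 - b) * sqrt(b) decreases to 0 on it
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (2.0 - mid) * math.sqrt(mid) > lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), s, lam
```

As `s` tends to zero, the returned `lam` tends to zero (Theorem 3.1), the first coordinate tends to 2 so that $\partial_1 \varphi = \beta_1 - 2$ vanishes (Theorem 3.2), and the ratio $\beta_2^{1-p} / \lambda$ tends to a constant (here 1), as (28) predicts.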

Theorem 3.1 (cf. Fact 1) immediately yields the following corollary.

Corollary 1 (Multiplicity and non-monotonicity of the $c$-$\lambda$ correspondence for $0 < p < 1$). Consider the
correspondence between $c$ and $\lambda$ over a path connecting the origin and an OLS solution.^3 Then, the following
statements hold.

1) $c(\lambda)$ is a multi-valued function of $\lambda \ge 0$.

2) $\lambda(c)$ is a non-monotonic function of $c \ge 0$.

Corollary 1.1 states that, given a $\lambda$ value, there are multiple critical points $\beta_\lambda$ that have different values of $c(\lambda) :=
F_p(\beta_\lambda)$. From Theorem 3.2, one can verify the following:

• Every breakpoint is the best, in the sense of minimizing $\varphi$, among all points having the same support.^4

• Any solution of $(\mathcal{L}_\lambda^p)$ for any $\lambda > 0$ is not the best, in the sense of minimizing $\varphi$, among all points having
the same support as the solution itself.

^3 The path may not be a single critical path but could be composed of a union of multiple critical paths.

^4 Some readers may think that Theorem 3.1 means breakpoints can be obtained by solving $(\mathcal{L}_0^p)$. This is, however, not true because the
solution of $(\mathcal{L}_0^p)$ is clearly an OLS solution for any $p > 0$.
Finally, we present the connection theorem at breakpoints. Let M denote the coordinate plane associated with 
XU {i'} and A/-, (c M) the coordinate plane associated with T. On Mp, the critical-path equation is given by 

Y,9^M'^) - (1 -p)A(c)/5r(2-p)(c)/5,(c) = _A(c)/3r(i-P)(c), i g X, (31) 

which is identical to the critical-path equation on M with Pi' (c) = 0. This leads us to the following theorem. 

Theorem 4 (Connection theorem at breakpoints). Suppose that two smooth curves of critical points are connected 
at a breakpoint. Then, the curves touch tangentially at the breakpoint. 

IV. Greedy Path and Its Link to OMP 

In this section, we consider two continuous paths of critical points, a main path and a greedy path, in the 
overdetermined case. 

A. Main Path and Greedy Path 

The main path is a continuous curve from the OLS solution $\beta^*$ to the origin; e.g., the blue curves in Figs. 6(a) and 7, and the union of the green, red, and blue curves in Fig. 8(a) (see the appendix). To be precise, the main path is defined as follows.

Definition 3 (Main path). 

1) A main path starts from $\beta^*$ (the initial active-index set is $\mathcal{I}_0 := \{1, 2, \dots, n\}$ generically) and follows the critical-path equation (26).

2) If it reaches a breakpoint where some variable, say $\beta_{i^*}$, becomes zero, then the path follows (26) with the updated active-index set $\mathcal{I}_1 := \mathcal{I}_0 \setminus \{i^*\}$.

3) If it reaches the next breakpoint where another variable, say $\beta_{j^*}$, becomes zero, then the path follows (26) with $\mathcal{I}_2 := \mathcal{I}_1 \setminus \{j^*\} = \mathcal{I}_0 \setminus \{i^*, j^*\}$.

4) Repeat the same procedure until the path reaches the origin. 

On the other hand, a greedy path is a continuous curve which starts at the origin and possibly ends at $\beta^*$. It is an extension of the LARS path to the case of $p < 1$ and provides a remarkable link between the $\ell_p$-regularized least squares and OMP. The greedy path is defined as follows.

Definition 4 (Greedy path). 

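1) A greedy path starts from the origin and follows the critical-path equation (26) with $\mathcal{I}_0 := \{i^*\}$ for $i^* \in \arg\max_{i=1,2,\dots,n} |\partial_i\varphi(0)|$. At the origin, (26) suggests the direction* $[0, \dots, 0, -\partial_{i^*}\varphi(0), 0, \dots, 0]^T$.

2) Once it reaches a breakpoint $\beta_{\mathrm{BR}}^{(1)}$ where $\partial_{i^*}\varphi(\beta_{\mathrm{BR}}^{(1)}) = 0$, the path follows (26) with the updated active-index set $\mathcal{I}_1 := \mathcal{I}_0 \cup \{j^*\} = \{i^*, j^*\}$ for $j^* \in \arg\max_{j=1,2,\dots,n} |\partial_j\varphi(\beta_{\mathrm{BR}}^{(1)})|$.

3) Once it reaches the next breakpoint $\beta_{\mathrm{BR}}^{(2)}$ where $\partial_{i^*}\varphi(\beta_{\mathrm{BR}}^{(2)}) = \partial_{j^*}\varphi(\beta_{\mathrm{BR}}^{(2)}) = 0$, the path follows (26) with $\mathcal{I}_2 := \mathcal{I}_1 \cup \{k^*\} = \{i^*, j^*, k^*\}$ for $k^* \in \arg\max_{k=1,2,\dots,n} |\partial_k\varphi(\beta_{\mathrm{BR}}^{(2)})|$.

4) Repeat the same procedure until the path reaches $\beta^*$. (The path would stop if some variable became zero accidentally.)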

Suppose, in the first step of the greedy path, that (26) suggests an undesirable direction in the sense that the path leads to the opposite side from $\beta^*$ with respect to its $i^*$th component. Such an $i^*$ could be excluded from the active-index selection, since the path cannot reach $\beta^*$ without getting $\beta_{i^*}$ back to zero. We thus define the modified greedy path as follows.

* This is because (i) $\lambda(c) > 0$, (ii) $\dot{\lambda}(c) > 0$ in the vicinity of the origin since $\lambda(c) \to 0$ as $\beta \to 0$, and (iii) $K(\beta) \to -\infty$ as $\beta$ approaches $0$ along some coordinate. Indeed, as $\beta$ approaches $0$ along some coordinate, $\dot{\lambda}(c) \to k \in (0, \infty)$ and $K(\beta)\lambda(c) \to -\infty$ so that $\dot{\beta}_{i'}(c) \to 0$, $i' \in \mathrm{supp}(\beta)$ (see Theorem 3.1).
Definition 5 (Modified Greedy Path). In the first step to finding the greedy path, let $i^* \in \arg\max_{i \in \mathcal{J}_0} |\partial_i\varphi(0)|$, where $\mathcal{J}_0 := \{i = 1, 2, \dots, n : \partial_i\varphi(0)\beta_i^* < 0\}$. In the second step, let $j^* \in \arg\max_{j \in \mathcal{J}_1} |\partial_j\varphi(\beta_{\mathrm{BR}})|$, where $\mathcal{J}_1 := \{i = 1, 2, \dots, n : \partial_i\varphi(\beta_{\mathrm{BR}})\beta_i^* < 0\}$. The same applies to the subsequent steps.
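The index-selection rules of Definitions 4 and 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function name `select_index` is hypothetical, and already-active indices are excluded explicitly as an assumption, since their partial derivatives vanish at a breakpoint only up to rounding.

```python
def select_index(grad, beta_star=None, active=()):
    """Pick the index that becomes active next.

    grad      : gradient of phi at the origin or at the current breakpoint
    beta_star : OLS solution; if given, apply the sign test of Definition 5
                (keep i only if grad[i] * beta_star[i] < 0)
    active    : indices that are already active (excluded by assumption)
    Returns None when no admissible index remains.
    """
    candidates = [i for i in range(len(grad)) if i not in active]
    if beta_star is not None:
        candidates = [i for i in candidates if grad[i] * beta_star[i] < 0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: abs(grad[i]))


# Values from Example A.3 in the appendix (0-based indices).
grad_at_origin = [0.96, -0.56, -0.8]
beta_star = [0.2, 0.8, 1.0]
unmodified_choice = select_index(grad_at_origin)
modified_choice = select_index(grad_at_origin, beta_star)
```

With the values of Example A.3, the unmodified rule picks the first coordinate, whereas the modified rule rejects it because $\partial_1\varphi(0)\beta_1^* > 0$ and picks the third coordinate instead.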

For a fixed $\mathcal{I}$, both the main and greedy paths are smooth because their directions are governed by (26). The way of selecting active indices in the greedy path will be validated in Section IV-B by using a generalized Minkowskian gradient. In Examples 2, 3, A.1, A.2, the greedy paths coincide with the main paths, and those in Examples 3, A.1, A.2 are homeomorphic with each other. Note that $c$ is not necessarily monotonic along the main/greedy path (see Example A.2). A particular case in which the modification is required for the greedy path is Example A.3 in the appendix.

Important observations regarding the relation between the four paths (global solution path, critical paths, main 
path, and greedy path) are summarized below. 

Observation 1. 

1) Generically, there is a unique main path and a unique greedy path.*

2) The main path, greedy path, and global solution path are subsets of $\mathcal{C}$.

3) The main path (the greedy path) is composed of a union of multiple $(\mathcal{L}_\lambda)$ paths.

4) The main path (the greedy path) is either a single $(\mathcal{P}_c)$ path or composed of a union of multiple $(\mathcal{P}_c)$ paths (see Example A.1 in the appendix).

5) When $G = I$, the main and global solution paths coincide with each other, or otherwise the main path includes the global solution path as its subset (see Example A.1).

Remark 2 (Underdetermined case). In the underdetermined case, there are infinitely many OLS solutions. The main path can still be defined as the one starting from a sparsest OLS solution $\beta^*$. In this case, however, it is not useful for solving a sparse optimization problem because its starting point is a solution of the problem. The greedy path is, however, useful. The minimum-norm OLS solution $\beta^* := (X^T)^{\dagger} y$ can be used to determine the modification process.

B. Generalized Minkowskian Gradient and Greedy Path 

We show that part of the greedy path can be constructed with a generalized Minkowskian gradient. See [19] for a study of the Minkowskian gradient for sparse optimization with $p = 1$, which encompasses non-quadratic convex objective functions. To define a generalized Minkowskian gradient, we introduce a pseudo-norm below.

Definition 6. Given any vector $\beta \in \mathbb{R}^n$ with $\mathrm{supp}(\beta) = \mathcal{I}$ such that the Hessian matrix $K(\beta)$ is positive definite, we define the pseudo-norm of a vector $a \in \mathbb{R}^n$, depending on the position $\beta$, by

$$Q_\beta(a) := \sqrt{a_{\mathcal{I}}^T K(\beta) a_{\mathcal{I}}} + \sum_{i \notin \mathcal{I}} |a_i|^p. \tag{32}$$

Definition 7. Given any vector $\beta \in \mathbb{R}^n$ such that $K(\beta)$ is positive definite, the generalized Minkowskian gradient of $\varphi(\beta)$ is defined as follows:

$$\nabla_{\mathrm{GM}}\varphi(\beta) := \arg\max_{Q_\beta(a) = 1} a^T \nabla\varphi(\beta). \tag{33}$$

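Lemma 3 (Generalized Minkowskian gradient at $\beta = 0$). The generalized Minkowskian gradient at the origin is given by

$$[\nabla_{\mathrm{GM}}\varphi(0)]_i = \begin{cases} -\mathrm{sgn}(\partial_{i^*}\varphi(0)), & i = i^* \in \arg\max_{i=1,2,\dots,n} |\partial_i\varphi(0)|, \\ 0, & \text{otherwise}, \end{cases} \quad \forall i = 1, 2, \dots, n. \tag{34}$$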

* In an exceptional case, for instance, in which $\beta^* := [1, 1]^T$, $G := I$, $p = 0.5$, the main path starts at $\beta^*$ in the direction towards the origin and splits into three paths: one goes straight to the origin, and the others go to the origin via the breakpoints $[0, 1]^T$ and $[1, 0]^T$, respectively, due to symmetry.
Proof: The pseudo-norm $Q_0$ coincides with the $\ell_p$ quasi-norm, and the generalized Minkowskian gradient is equivalent to the Minkowskian gradient for $p = 1$, as stated in (34). This can readily be verified by the concavity of the $\ell_p$ quasi-norm in each orthant. □
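The concentration on a single coordinate in Lemma 3 can be checked numerically with the following minimal sketch (not taken from the paper). It assumes that the constraint at the origin is normalized as $\sum_i |a_i|^p = 1$, and it reports only the selected index and the attained objective value, not the sign convention of (34). All function names are hypothetical.

```python
import random


def best_one_hot(g):
    """Index and objective value of the best single-coordinate vector.

    A vector +/- e_i satisfies sum_j |a_j|^p = 1 and attains a^T g = |g_i|.
    """
    i_star = max(range(len(g)), key=lambda i: abs(g[i]))
    return i_star, abs(g[i_star])


def random_feasible(n, p, rng):
    """Random vector a with sum_i |a_i|^p = 1 (random signs, random weights)."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [rng.choice((-1.0, 1.0)) * (wi / total) ** (1.0 / p) for wi in w]


def max_random_objective(g, p, trials=10000, seed=0):
    """Largest a^T g found over random feasible vectors."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(trials):
        a = random_feasible(len(g), p, rng)
        best = max(best, sum(ai * gi for ai, gi in zip(a, g)))
    return best


g = [0.96, -0.56, -0.8]  # gradient at the origin in Example A.3
index, value = best_one_hot(g)
sampled = max_random_objective(g, p=0.5)
```

No sampled feasible vector exceeds the one-hot value. This is expected: for $p < 1$ the constraint implies $\sum_i |a_i| \le 1$, so $a^T g \le \max_i |g_i|$.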

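Lemma 4 (Generalized Minkowskian gradient at $\beta$, $\beta_i \neq 0$, $\forall i = 1, 2, \dots, n$). For $\beta$ with $\beta_i \neq 0$, $\forall i = 1, 2, \dots, n$, the generalized Minkowskian gradient is given by

$$\nabla_{\mathrm{GM}}\varphi(\beta) = \frac{K^{-1}(\beta)\nabla\varphi(\beta)}{\sqrt{\nabla\varphi(\beta)^T K^{-1}(\beta)\nabla\varphi(\beta)}} \propto K^{-1}(\beta)\nabla\varphi(\beta). \tag{35}$$

Proof: The claim is readily verified with a Lagrange multiplier.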

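Lemma 5 (Generalized Minkowskian gradient at a general $\beta$).

1) Let $\nabla_{\mathcal{I}}\varphi(\beta) \neq 0$. Then,

$$\nabla_{\mathrm{GM},\mathcal{I}}\varphi(\beta) = \frac{K^{-1}(\beta)\nabla_{\mathcal{I}}\varphi(\beta)}{\sqrt{\nabla_{\mathcal{I}}\varphi(\beta)^T K^{-1}(\beta)\nabla_{\mathcal{I}}\varphi(\beta)}} \propto K^{-1}(\beta)\nabla_{\mathcal{I}}\varphi(\beta), \tag{36}$$

$$[\nabla_{\mathrm{GM}}\varphi(\beta)]_i = 0, \quad i \notin \mathcal{I}. \tag{37}$$

2) Let $\nabla_{\mathcal{I}}\varphi(\beta) = 0$; i.e., let $\beta$ be a breakpoint. Then,

$$\nabla_{\mathrm{GM},\mathcal{I}}\varphi(\beta) = 0, \tag{38}$$

$$[\nabla_{\mathrm{GM}}\varphi(\beta)]_i = \begin{cases} -\mathrm{sgn}(\partial_{i^*}\varphi(\beta)), & i = i^* \in \arg\max_{i=1,2,\dots,n} |\partial_i\varphi(\beta)|, \\ 0, & \text{otherwise}, \end{cases} \quad \forall i \notin \mathcal{I}. \tag{39}$$

Proof: The pseudo-norm $Q_\beta(a)$ is a first-order function of $a_i$ for $i \in \mathcal{I}$ while it is a $p$th-order function of $a_i$ for $i \notin \mathcal{I}$. Since $p < 1$, in order to maximize $a^T\nabla\varphi(\beta) = a_{\mathcal{I}}^T\nabla_{\mathcal{I}}\varphi(\beta) + a_{\mathcal{I}^c}^T\nabla_{\mathcal{I}^c}\varphi(\beta)$, where $\mathcal{I}^c$ is the complement of $\mathcal{I}$, all resources should be allocated to $a_{\mathcal{I}}$ if $\nabla_{\mathcal{I}}\varphi(\beta) \neq 0$, and to $a_{\mathcal{I}^c}$ if $\nabla_{\mathcal{I}}\varphi(\beta) = 0$. This verifies the claim. □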

Lemmas 3-5 lead to the following theorem. 

Theorem 5. The direction vector of the greedy path is given by $-\nabla_{\mathrm{GM}}\varphi(\beta)$ at any point $\beta$ where $K(\beta)$ is positive definite, including the origin and all the breakpoints.

Note here that $\dot{\lambda}(c) < 0$ in (26) when $K(\beta)$ is positive definite (cf. Theorem 2). Note also that, when $K(\beta)$ has a negative eigenvalue, the direction vector of the greedy path on the coordinate plane associated with the active-index set $\mathcal{I}$ is given by $K^{-1}(\beta)\nabla_{\mathcal{I}}\varphi(\beta)$, rather than $-K^{-1}(\beta)\nabla_{\mathcal{I}}\varphi(\beta)$. This is because the direction vector in this case is $\dot{\beta}(c)$ and $\dot{\lambda}(c) > 0$ if $c$ increases along the greedy path, while the direction vector is $-\dot{\beta}(c)$ and $\dot{\lambda}(c) < 0$ if $c$ decreases. Special care is therefore required at those points where the Hessian matrix $K(\beta)$ is singular.

C. Link Between $\ell_p$-Regularized Least Squares and OMP

The following theorem immediately follows from the definition of the greedy path.

Theorem 6 (Link between OMP and the $\ell_p$-regularized least squares). Suppose that the (unmodified) greedy path continues to an OLS solution. Then, the breakpoints of the greedy path coincide with the step-by-step solutions generated by OMP.

Corollary 2 (Link between OMP and $(\mathcal{L}_\lambda)$). Suppose that the (unmodified) greedy path continues to an OLS solution. Each step-by-step solution generated by OMP is the limit of a convergent sequence of critical points of $(\mathcal{L}_\lambda)$ as $\lambda \downarrow 0$.

Proof: The claim is readily verified using Theorems 3.1 and 6. □ 

The link between OMP and the $\ell_p$-regularized least squares presented in Theorem 6 is more direct than the one between OMP and $\ell_1$ minimization. Theorem 6 naturally leads us to the modified OMP algorithm below.

Algorithm 1 (Modified OMP Algorithm). Compute the breakpoints of the modified greedy path one by one in the same way as OMP; i.e., minimize $\varphi$, at each step, in terms of active variables with inactive variables being zero.
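Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes $\varphi(\beta) = \frac{1}{2}(\beta - \beta^*)^T G (\beta - \beta^*)$ with a known positive definite $G$ and a known OLS solution $\beta^*$, so that $\nabla\varphi(\beta) = G(\beta - \beta^*)$. The names `solve` and `breakpoints` are hypothetical. The sketch computes only the support-restricted minimizers; it does not trace the path between them.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def breakpoints(G, beta_star, modified=True, tol=1e-12):
    """Minimizers of phi over the growing active set (OMP-style steps).

    modified=True applies the sign test of Definition 5 when selecting
    the next index; modified=False gives the plain OMP selection.
    """
    n = len(beta_star)
    rhs = [sum(G[i][j] * beta_star[j] for j in range(n)) for i in range(n)]
    beta, active, out = [0.0] * n, [], []
    while len(active) < n:
        grad = [sum(G[i][j] * beta[j] for j in range(n)) - rhs[i]
                for i in range(n)]
        cand = [i for i in range(n)
                if i not in active and abs(grad[i]) > tol]
        if modified:
            cand = [i for i in cand if grad[i] * beta_star[i] < 0]
        if not cand:
            break
        active.append(max(cand, key=lambda i: abs(grad[i])))
        # Minimize phi over the active coordinates, others fixed at zero.
        sub = [[G[i][j] for j in active] for i in active]
        sol = solve(sub, [rhs[i] for i in active])
        beta = [0.0] * n
        for i, v in zip(active, sol):
            beta[i] = v
        out.append(beta[:])
    return out


# Data of Example A.3 in the appendix.
G = [[1.0, -0.7, -0.6], [-0.7, 1.0, -0.1], [-0.6, -0.1, 1.0]]
beta_star = [0.2, 0.8, 1.0]
modified_steps = breakpoints(G, beta_star, modified=True)
plain_steps = breakpoints(G, beta_star, modified=False)
```

On the data of Example A.3, the first two modified steps agree with the breakpoints reported there, and the first two plain OMP steps agree with the breakpoints of the unmodified greedy path.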
Let us have a fresh look at Example 3. It is easily verified that the step-by-step solutions of OMP in the example are $\beta_{\mathrm{OMP}}^{(1)} := [2, 0]^T$ and $\beta_{\mathrm{OMP}}^{(2)} := [2, 1]^T$. One can see that $\beta_{\mathrm{OMP}}^{(1)}$ is the breakpoint of the greedy path B1C2 - A1C2 - A1B2 - A1A2 (the blue curve in Fig. 6(a)), and $\beta_{\mathrm{OMP}}^{(2)}$ is the end point of A1A2, which is the OLS solution. This clearly demonstrates Theorem 6.

V. Conclusion 

This paper investigated the least squares problem by making two different formulations involving $\ell_p$-regularization ($0 < p < 1$): $\ell_p$-constrained least squares $(\mathcal{P}_c)$ and $\ell_p$-penalized least squares $(\mathcal{L}_\lambda)$. The key findings are summarized as follows.

1) The essential difference between $(\mathcal{P}_c)$ and $(\mathcal{L}_\lambda)$: the $(\mathcal{L}_\lambda)$-path is a proper subset of the $(\mathcal{P}_c)$-path (Theorem 1). The two problems are also different in terms of their local minimality (Lemma 1).

2) Discontinuity of the solution paths: the $(\mathcal{L}_\lambda)$ solution paths are always discontinuous, whereas the $(\mathcal{P}_c)$ solution paths are possibly discontinuous (Lemma 2 and Example A.2). This is due to the nonconvexity of the $\ell_p$ quasi-norm.

3) Properties of breakpoints: $\lambda(c) = 0$ at any breakpoint (Theorem 3.1). Moreover, every breakpoint is the best, in the sense of minimizing $\varphi$, among all points having the same support (Theorem 3.2). Two smooth curves connected at a breakpoint touch tangentially (Theorem 4).

4) Multiplicity (non-monotonicity) in the correspondence between the regularization parameters: multiple $c$ values in $(\mathcal{P}_c)$ correspond to a single value of $\lambda$ in $(\mathcal{L}_\lambda)$ (Corollary 1).

5) Greedy path and generalized Minkowskian gradient: the direction vector of the greedy path is given by the generalized Minkowskian gradient at any point where the Hessian matrix $K(\beta)$ is positive definite (Theorem 5).

6) The direct link between OMP and $\ell_p$-regularized least squares: the breakpoints of the greedy path coincide with OMP step-by-step solutions (Theorem 6). The link is more direct than that between OMP and $\ell_1$ minimization.

It should be remarked that some parts of the greedy path are not covered by the theory presented in [13, 16]. Indeed, what is obtained by the existing approximate solvers for $(\mathcal{L}_\lambda)$ given some $\lambda > 0$ is a stable critical point of $(\mathcal{L}_\lambda)$, which is not necessarily on the greedy path. The fundamental study on critical paths presented here will be a useful basis for making the output of an $\ell_p$-regularization-based approach more controllable. Developing a computational method to construct a main/greedy path will be an interesting future work.

Appendix A 
Examples 

This appendix presents four examples:
A.1 Non-orthogonal case ($G \neq I$) for $n = 2$ and $p = 0.5$. This is a simple example of critical paths for a non-orthogonal case.
A.2 Orthogonal case ($G = I$) for $n = 2$ and $p = 0.7$. This is a particular case in which (i) the $(\mathcal{P}_c)$ solution path is discontinuous, and (ii) $c$ is non-monotonic along the main/greedy path.
A.3 Non-orthogonal case for $n = 3$ and $p = 0.5$. This is a particular case in which a modification must be made to get the greedy path (see Definition 5).
A.4 Orthogonal case for $n = 5$ and $p = 0.5$. This is an example of greedy paths for a higher dimensional system.

Example A.1 (2D non-orthogonal case). Consider the following example: $\varphi(\beta) := \frac{1}{2}\|\beta - \beta^*\|_G^2 := \frac{1}{2}(\beta - \beta^*)^T G (\beta - \beta^*)$ with $\beta^* := [2, 1]^T$, $G := \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$, and $F_{0.5}(\beta) := 2(|\beta_1|^{0.5} + |\beta_2|^{0.5})$. In this case, there are three $(\mathcal{P}_c^{0.5})$ paths; Fig. 7 shows the critical paths drawn in different colors. Unlike the case of $G = I$ in Example 3, the function $f_\lambda$ cannot be separated as in (25) and, therefore, one should consider both variables $\beta_1$ and $\beta_2$ together in order to find the critical points. In the general case of $n \geq 2$, the partial derivatives $\partial_i f_\lambda(\beta)$ for $i \in \mathrm{supp}(\beta)$ depend on the other variables, and the condition for $\beta$ to be a critical point is given by

$$\beta_i + a_i + \lambda|\beta_i|^{p-1}\mathrm{sgn}(\beta_i) = 0, \quad \forall i \in \mathrm{supp}(\beta), \tag{A.1}$$

where

$$a_i(\beta_1, \beta_2, \dots, \beta_{i-1}, \beta_{i+1}, \dots, \beta_n) := -\beta_i^* + \sum_{j \neq i} g_{i,j}(\beta_j - \beta_j^*). \tag{A.2}$$

Here, $g_{i,j}$ is the $(i,j)$ component of $G$.

Example A.2 (2D orthogonal case for $p = 0.7$). Consider the following case of $n = 2$: $\varphi(\beta) := \frac{1}{2}\|\beta - \beta^*\|_2^2$ with $\beta^* := [2, 1]^T$ and $F_p(\beta) := \frac{1}{p}(|\beta_1|^p + |\beta_2|^p)$ for $p = 0.7$. Although the only difference from Example 3 is the $p$ value, it leads to significant differences as explained below.

1) $c$ is non-monotonic along the path connecting the origin and $\beta^*$ (which will be referred to as the main path in Section IV), as illustrated in Fig. 8(b).

2) Because of the non-monotonicity of $c$, the path from the origin to $\beta^*$ is separated into three $(\mathcal{P}_c^{0.7})$ paths. One of the separation points is located at the breakpoint $[2, 0]^T$, and the other one is located at the point where $c$ starts to increase in Fig. 8(b). The separation points of the $(\mathcal{P}_c^{0.7})$ paths are indicated by squares in Figs. 8 and 9.

3) From the breakpoint $[2, 0]^T$ to the OLS solution $\beta^* = [2, 1]^T$, the local minimality in $(\mathcal{P}_c^{0.7})$ changes at the separation point. All points on the blue curve in Fig. 8(a) are local minima in $(\mathcal{P}_c^{0.7})$, while the red curve excluding the endpoints contains no local minima in $(\mathcal{P}_c^{0.7})$. See the discussion below item 5).

4) Neither the red nor the blue curve ($(\mathcal{P}_c^{0.7})$ paths) in Fig. 8(a) is composed of $(\mathcal{L}_\lambda^{0.7})$ paths. From $[2, 0]^T$ to $\beta^* = [2, 1]^T$, the two $(\mathcal{L}_\lambda^{0.7})$ paths are connected at the triangle where $\beta_1$ starts to increase. The separation points of $(\mathcal{L}_\lambda^{0.7})$ paths are indicated by triangles in Figs. 8 and 9. The local minimality in $(\mathcal{L}_\lambda^{0.7})$ changes at the separation point. See the discussion below item 5).

5) The $(\mathcal{P}_c^{0.7})$ global path is discontinuous. This can be seen by observing that the minimum value of $\varphi$ in Fig. 9(b) switches from the green curve to the blue one at the intersection of the two curves. The $(\mathcal{P}_c^{0.7})$ global solution therefore jumps from the green curve to the blue one in Fig. 8(a).

To discuss the local optimality of critical points $\beta$ on the curve from the breakpoint $[2, 0]^T$ to the endpoint $\beta^* = [2, 1]^T$ in Fig. 8(a), we will analyze the positive definiteness of the Hessian matrix $K(\beta)$ with Lemma 1. The matrix $K(\beta)$ is indeed not positive semidefinite from the breakpoint up to the separation point (triangle) of the $(\mathcal{L}_\lambda^{0.7})$ paths and is positive semidefinite from the separation point to the endpoint, and thus, item 4) above applies. However, between the two separation points (the square and the triangle on the curve), $K(\beta)$ is positive definite for tangent vectors, leading to item 3) above. From Fig. 8(b), it is apparent that there are two critical points, off the $\beta_1$-coordinate, corresponding to some $c$ value. Indeed, there is another critical point, on the $\beta_1$-coordinate, corresponding to such a $c$ value. This implies that, given a surface $\partial B_c$ for some $c$, there exist three contours $\mathcal{R}$ of $\varphi$, touching $\partial B_c$. In particular, one of the contours $\mathcal{R}$, passing through a critical point $\beta$ (on the red curve) very close to the $\beta_1$-coordinate, is closer to the tangent line than $\partial B_c$ in the vicinity of $\beta$, meaning that $\beta$ is a local maximum in $(\mathcal{P}_c^{0.7})$.
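The non-monotonicity of $c$ in this example can be reproduced with a minimal numerical sketch (not taken from the paper). It assumes that, in the orthogonal case, the critical curve from $[2, 0]^T$ to $[2, 1]^T$ is obtained by solving the two scalar conditions $\lambda = (\beta_i^* - \beta_i)\beta_i^{1-p}$ with a common $\lambda$, keeping $\beta_1$ on the branch near $\beta_1^* = 2$ and using $\beta_2 \in [0, 1]$ as the curve parameter. The function names are hypothetical.

```python
def lam_of_beta(beta, target, p):
    """lambda for which beta in [0, target] is a scalar critical point."""
    return (target - beta) * beta ** (1.0 - p)


def upper_branch(lam, target, p, iters=200):
    """Critical point between the peak of lam_of_beta and target (bisection)."""
    lo = target * (1.0 - p) / (2.0 - p)  # location of the peak
    hi = target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam_of_beta(mid, target, p) > lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def c_along_segment(p, steps=200, beta_star=(2.0, 1.0)):
    """c = F_p(beta) sampled along the curve from [2, 0] to [2, 1]."""
    cs = []
    for k in range(steps + 1):
        b2 = beta_star[1] * k / steps
        lam = lam_of_beta(b2, beta_star[1], p)
        b1 = upper_branch(lam, beta_star[0], p)
        cs.append((b1 ** p + b2 ** p) / p)
    return cs


c_p07 = c_along_segment(0.7)  # dips below its starting value, then rises
c_p05 = c_along_segment(0.5)  # increases along the whole segment
```

For $p = 0.7$ the sampled $c$ first decreases away from the breakpoint and then increases, whereas for $p = 0.5$ it increases throughout. This is consistent with the remark that the $p$ value is the only difference from Example 3.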

Example A.3 (3D non-orthogonal case). Consider the following three-dimensional case: $\beta^* := [0.2, 0.8, 1]^T$,

$$G := \begin{bmatrix} 1 & -0.7 & -0.6 \\ -0.7 & 1 & -0.1 \\ -0.6 & -0.1 & 1 \end{bmatrix},$$

and $p = 0.5$.* In this case, $\nabla\varphi(0) = [0.96, -0.56, -0.8]^T$ and, hence, (26) suggests the direction $[-\partial_1\varphi(0), 0, 0]^T \propto [-1, 0, 0]^T$ although $\beta_1^* = 0.2 > 0$. The (unmodified) greedy path is therefore located on the opposite side of the $\beta_2$-$\beta_3$ coordinate plane from $\beta^*$, and thus the modified greedy path selects another direction $[0, 0, -\partial_3\varphi(0)]^T \propto [0, 0, 1]^T$. The modified greedy path leads to the OLS solution $\beta^*$ via the breakpoints $\beta_{\mathrm{BR}}^{(1)} = [0, 0, 0.8]^T$ and $\beta_{\mathrm{BR}}^{(2)} \approx [0, 0.6465, 0.8646]^T$. This is actually the main path. The unmodified greedy path passes through the breakpoints $[-0.96, 0, 0]^T$ and $[-0.75, 0, 0.35]^T$, and then all the components become active, ending up with two active components $\beta_1$ and $\beta_2$ simultaneously going back to zero at $[0, 0, 0.8]^T$.

* This is the case in which LARS requires the Lasso modification to obtain the Lasso solution path.
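Example A.4 (5D orthogonal case). Consider the following case of dimension $n = 5$ under the orthogonality condition $XX^T = I$: $\varphi(\beta) := \frac{1}{2}\|\beta - \beta^*\|_2^2$ with $\beta^* := [1, 0.7, -0.5, 0.3, -0.1]^T$ and $p = 0.5$. The function $f_\lambda$ can be separated as follows:

$$f_\lambda(\beta) = \sum_{i=1}^{5} f_{\lambda,i}(\beta_i), \quad \beta \in \mathbb{R}^5, \tag{A.3}$$

where $f_{\lambda,i}(\beta) := \frac{1}{2}(\beta - \beta_i^*)^2 + \frac{1}{p}\lambda|\beta|^p$, $\beta \in \mathbb{R}$. The critical-point condition for $(\mathcal{L}_\lambda)$ can also be written separately as follows (see Definition 1):

$$\nabla_{\mathcal{I}} f_\lambda(\beta) = [f'_{\lambda,i_1}(\beta_{i_1}), f'_{\lambda,i_2}(\beta_{i_2}), \dots, f'_{\lambda,i_s}(\beta_{i_s})]^T = 0_s, \tag{A.4}$$

where $\mathcal{I} := \{i_1, i_2, \dots, i_s\} := \mathrm{supp}(\beta)$, $f'_{\lambda,i}$ is the derivative of $f_{\lambda,i}$, and $0_s$ is the zero vector of length $s$.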

The nonzero critical points for each individual function $f_{\lambda,i}$ are plotted in Fig. 10(a). (Note that zero is always a critical point for any $f_{\lambda,i}$ and thus is omitted.) On each curve in Fig. 10(a), there are two points corresponding to each $\lambda$. The one with a smaller absolute value is a local maximum and the one with a larger absolute value is a local minimum (see Fig. 2). The greedy path goes along the $\beta_1$ coordinate until $\beta_1 = \beta_1^* = 1$. In Fig. 10(a), we can trace the blue curve from (0, 0) to (0, 1), and in Fig. 10(b) we can trace the blue curve from (0, 0) to (2, 1). (The variables $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ stay at zero.) Next, the new entry $\beta_2$ becomes active, and it increases from zero up to $\beta_2 = \beta_2^* = 0.7$. In this case, we can trace the blue curve in Fig. 10(b) from (0, 1) to the next peak and the green curve from (2, 0) to its first peak. In Fig. 10(a), we can trace the blue curve from (0, 1) until $\lambda$ reaches a point at which the function $f_{\lambda,2}$ has its unique nonzero critical point (the peak of the green curve) and trace the same path in a reverse way back to (0, 1). Also in Fig. 10(a), we can trace the green curve from (0, 0) to (0, 0.7). (The variables $\beta_3$, $\beta_4$, and $\beta_5$ stay at zero meanwhile.) One can follow the same procedure to see the whole picture of the greedy path. It can be seen that the greedy path connects the origin and the OLS solution $\beta^*$ continuously in this case.

All the critical paths can be found in this way. For instance, let us consider another particular path on which all the variables become active when stepping slightly away from the origin. In this case, $\beta_5$ achieves its peak in Fig. 10(a) before the others and one cannot increase $\lambda$ any further. What one can do here is to reduce $\lambda$. Accordingly, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ can only go back to zero by tracing the same path in Fig. 10(a) in a reverse way, and only $\beta_5$ can trace the purple curve up to (0, -0.1). In this case, the whole path starts at the origin and ends at $\beta := [0, 0, 0, 0, -0.1]^T$; it consists of two $(\mathcal{L}_\lambda^{0.5})$ paths because a critical path is a single-valued function of $c$ (or $\lambda$) by definition. Along the path, $c$ increases up to some point and then starts to decrease. Hence, the path is divided into two parts: the part containing the origin is a $(\mathcal{P}_c^{0.5})$ path; the other part becomes another $(\mathcal{P}_c^{0.5})$ path by extending it with a straight line to the origin along the $\beta_5$ coordinate.
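Two observations in this example can be checked with a short sketch (not taken from the paper): which coordinate reaches its peak $\lambda$ first, and in which order the coordinates become active along the greedy path. It assumes the separable condition $\lambda = (|\beta_i^*| - |\beta_i|)|\beta_i|^{1-p}$ for a nonzero critical point of $f_{\lambda,i}$ and, for $G = I$, the selection of the largest remaining $|\beta_i^*|$ at each breakpoint. The name `peak_lambda` is hypothetical.

```python
def peak_lambda(target, p):
    """Largest lambda for which f_{lambda,i} has a nonzero critical point.

    The peak of (t - b) * b**(1 - p) over b in (0, t), with t = |target|,
    is attained at b = t * (1 - p) / (2 - p).
    """
    t = abs(target)
    b = t * (1.0 - p) / (2.0 - p)
    return (t - b) * b ** (1.0 - p)


beta_star = [1.0, 0.7, -0.5, 0.3, -0.1]
p = 0.5

# Peak lambda of each coordinate; the smallest peak is reached first when
# all variables leave the origin together (0-based indices).
peaks = [peak_lambda(t, p) for t in beta_star]
first_to_peak = min(range(len(peaks)), key=lambda i: peaks[i])

# With G = I the gradient at a breakpoint is beta - beta_star, so the greedy
# selection activates coordinates in decreasing order of |beta_star[i]|.
greedy_order = sorted(range(len(beta_star)), key=lambda i: -abs(beta_star[i]))
```

Under these assumptions the fifth coordinate has the smallest peak, which matches the statement that $\beta_5$ achieves its peak before the others, and the greedy order starts with $\beta_1$ followed by $\beta_2$.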
Fig. 1. Relation between $c$ and $\beta_c$ in Example 1.

Fig. 2. Graphs of $f_\lambda(\beta) := \frac{1}{2}(\beta - 1)^2 + 2\lambda|\beta|^{1/2}$.

Fig. 3. Relation between $\lambda$ and $\beta_\lambda$ in Example 1.

Fig. 4. Case that $\beta$ is a local minimum of $(\mathcal{P}_c)$.

Fig. 5. Critical points of $f_{\lambda,1}(\beta_1)$ and $f_{\lambda,2}(\beta_2)$ in Example 3.

Fig. 6. (a) Critical paths of $(\mathcal{P}_c^{0.5})$ with its correspondence to $(\mathcal{L}_\lambda^{0.5})$, (b) critical point $\beta_c = [\beta_{c,1}, \beta_{c,2}]^T$ as a function of $c$, and (c) $c$-$\lambda(c)$ correspondence in Example 3.

Fig. 7. Critical paths of $(\mathcal{P}_c^{0.5})$ in Example A.1.

Fig. 8. (a) The main path composed of a union of three $(\mathcal{P}_c^{0.7})$ paths and (b) non-monotonicity of $c$ along the main path in Example A.2.

Fig. 10. (a) $\lambda$-$\beta$ correspondences and (b) the greedy path for $n = 5$ in Example A.4.



